Lab 4: Spatial Wrangling


1 OVERVIEW


1.1 Learning Objectives

The aim of this week’s lab is to get comfortable reading in and managing spatial datasets using the sp and sf packages.

SEE CANVAS FOR SUBMISSION DATES.

1.2 Get help

If a link to a tutorial is broken, you should be able to go to the tutorial number and find it in the menu.

Teams is the fastest way to get help. CLICK THIS LINK FOR THE TEAMS WEBSITE FOR LAB HELP


2 LAB SET-UP


2.1 Create a project


2.2 Use a template

You are welcome to use your own template, but for ease I suggest using one of the professional ones, such as those in the rmdformats or prettydoc packages. To use these:

  • (if you have not already) click on the packages tab, then the install button. Install the rmdformats package and the prettydoc package.

  • Same as normal, go to File|New File|R Markdown. But NOW, click on the templates button on the left.

  • You will see a whole load of templates from the different packages. Each will give you professional formatting for very little work. To explore what they look like without having to try each one, google the websites for rmdformats, prettydoc and others.

  • To see what a template looks like, choose it, press OK, then press knit.

  • Choose your favourite. Finally, remember to add in the title and author lines at the top of your Rmd file. For example, here is the final YAML for this script.


2.3 Add libraries & code options

Edit the first “set-up” code chunk so it looks like this, then run/knit to load the libraries. You might need additional libraries as you work through the lab. If so, add them in this code chunk AND REMEMBER TO RERUN. If your template didn’t have a “setup” code chunk, just create one at the top.

If you see a little yellow bar at the top asking you to install them, click yes!

knitr::opts_chunk$set(cache = TRUE,message=FALSE,warning=FALSE,echo=TRUE)

# LIBRARIES
library(tidyverse)
library(dplyr)
library(ggpubr)
library(skimr)
library(ggplot2)
library(plotly)
library(knitr)
library(raster)
library(sp)
library(sf)
library(tmap)
library(terra)
library(palmerpenguins) #you might need to install this
library(rnaturalearth)



3 CODE SHOWCASE

3.1 Spatial Data Wrangling in Data Camp [20 MARKS]

Create a code showcase section in your lab report. Complete CHAPTER 1 AND CHAPTER 2 of this datacamp course on spatial data in R. Include a screenshot in your lab report to show you did it.

See Canvas for how to access data camp for free.



3.2 Markdown and inline code

3.2.1 Learn about inline code

One of the best parts of R markdown is that you can embed code in your actual report text. So imagine, for example, you had written about the mean car year in Lab 1, then realised you had made a mistake. Rather than having to change the answer in your write-up, you can include “inline” code, which will auto-update as you fix the mistake.

To see it in action and learn how to do it, follow these three links:


3.2.2 Test your learning [10 MARKS]

  1. Make a new code chunk and load the penguins dataset (from the package palmerpenguins).
data(penguins, package="palmerpenguins")
  2. Click on penguins in the environment tab to view the data. More details here: https://allisonhorst.github.io/palmerpenguins/

  3. In the same code chunk, find the MEAN flipper length and the MAXIMUM body mass, and save them to variables called flipper and mass.

    1. HINT: there is missing data in these columns. To ignore it, use na.rm=TRUE (see here https://www.statology.org/na-rm/ )
  4. In the code chunk options (the {r} bit), add include=FALSE so that the code chunk is invisible. When you press knit, it should look like nothing has happened.

  5. Finally, below the code chunk, write this sentence, using inline code to replace the XXXX/YYYY with the actual average flipper length / max body mass.


    In the Palmer Penguins dataset, the mean flipper length is XXXX mm and the maximum body mass is YYYY g.
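To see the pattern without giving away the answer, here is a minimal sketch on a different dataset. It assumes R's built-in airquality data (which also has missing values); the variable names ozone_mean and solar_max are made up for this illustration.

```r
# In your Rmd this would sit inside a chunk whose header includes
# include=FALSE, so neither the code nor its output appears when knitted.
data(airquality)   # built-in dataset with NAs in Ozone and Solar.R

# na.rm = TRUE drops the missing values before the calculation
ozone_mean <- mean(airquality$Ozone, na.rm = TRUE)
solar_max  <- max(airquality$Solar.R, na.rm = TRUE)

# Without na.rm = TRUE, a single NA makes the whole answer NA
mean(airquality$Ozone)

# In the report TEXT (outside any chunk), inline code is written
# between single backticks, starting with the letter r. For example:
#   The mean ozone reading was `r round(ozone_mean, 1)` ppb.
```

When you knit, the backtick expression is replaced by its value, so the sentence updates itself if the data or the code changes.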



4 MAIN ANALYSIS

4.1 Aim of the analysis

You are writing a report for Dr Sara Stoudt in the department of Maths at Bucknell University (for real - I am going to share these with her). Dr Stoudt is a spatial statistician who focuses on the analysis of crowd-sourced datasets such as https://www.inaturalist.org/ and https://journeynorth.org. Have a look at her bio here: https://sastoudt.github.io/.

Specifically, we will be conducting an analysis on a new crowd-sourced dataset that she has never seen before (again, for real): a crowd-sourced dataset on fireflies. As before, for the entire report below, we will be grading on the professionalism of your output.

4.2 Report set-up

Create these headings and sub-headings in your report.

  • BACKGROUND ON CROWD-SOURCED DATA
    • What is crowd-sourced data?
    • What are its strengths?
    • What are its weaknesses?
    • How do spatial fallacies impact it?
  • STUDY SUMMARY
    • Fireflies
    • The Firefly Watch project
  • DATA DESCRIPTION
  • DATA WRANGLING
  • SPATIAL ANALYSIS
  • MAPS
  • ANALYSIS

For example, when you press knit, it should look something like this:

4.3 Background on crowd-sourced data

Although GEOG-364 is not a writing course, it is important to be able to describe the data and topics you are covering. We are grading you on content, not on grammar. Crowd-sourced data is an important type of data which will only grow in popularity, so it’s important to understand its strengths and weaknesses.

First, BRIEFLY skim read these three papers. I start by reading the abstracts and sub-headings, then zoom in on whatever I find interesting.

4.3.1 What to write up

In the appropriate section of your report and referring to the sources above (plus other documents as you wish), write:

  1. One paragraph on what crowd-sourced nature data is [5 marks]

  2. One paragraph on the strengths and opportunities for crowd sourced data in understanding the natural world around us [5 marks]

  3. One paragraph on the weaknesses for crowd sourced data in understanding the natural world around us [5 marks]

  4. One paragraph where you use the course notes to explain how the non-uniformity of space and the locational fallacy might impact some of these datasets [5 marks]

4.4 Background on fireflies

firefly <- readxl::read_excel("fireflydata.xlsx")

Fireflies are well-loved insects, yet we don’t actually have a map of where they are, or know if they are declining or increasing. For example, we don’t know how climate change, pesticides or light pollution are affecting their numbers.

Refresh your knowledge on fireflies (these are just ideas; spend 5-10 minutes on this at most).

To gain more data, a group of researchers started a citizen-science project called Firefly Watch where people could submit their firefly observations. See more here:

https://www.massaudubon.org/get-involved/community-science/firefly-watch

The aim of this lab is to see if this crowd sourced dataset can show/explain spatial patterns in reported sightings of fireflies/lightning bugs.

4.4.1 What to write up

In the appropriate sections of the Study Summary part of your report,

  • Introduce fireflies as a topic and explain why we might want to map them, summarising a few facts about fireflies from your reading.

  • Introduce the Firefly Watch Study

There is a spell check next to the knit button at the top of the script. Press knit regularly to check it all looks good.

4.5 Data Analysis [10 MARKS]

4.5.1 Data description

  1. Go to the Canvas Lab 4 page and download the dataset (“firefly.xlsx”).

  2. Put it in your lab 4 project folder (or use the upload button in R-studio cloud to put it in your project)

  3. Use the Input/Output tutorial 61 to read the data into R and save it as a variable called firefly. (Hint: readxl package)

  4. View/summarise the data and get comfortable with it. You could use some summary statistics from Summary Stats Tutorial 8. You do not need to include them all.

  5. Write these details as a bullet point list in your data description section:

    • Object of analysis of the dataset

    • Population of the dataset (e.g. boundary in time and space)

    • Variables in the dataset

    • Number of objects/rows

    • Which years do we have data for? How many observations in each year? (hint, apply the table() command to the Year column of the firefly dataset)

    • Is Pennsylvania included in the dataset? How many observations were taken in PA?

    • In a new paragraph, explain if you think the firefly data is marked, and if so, give an example of a mark.

See if you can include the numbers as inline code rather than typing them.

HINT, there is not one row for every firefly that has ever existed in the USA. Think about what each row is.

HINT 2, WE SHOULD BE ABLE TO SEE IN YOUR CODE WHERE YOU GOT EACH ANSWER (e.g. leave your code visible)
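As a rough sketch of the commands that answer questions like these, here is a toy stand-in for the firefly table. The data and the column names (Year, State, Temp_F) are invented for illustration; check names(firefly) to see what the real columns are called.

```r
# Toy stand-in for the firefly table. These values and column names
# are hypothetical and are NOT the real data.
toy <- data.frame(
  Year   = c(2018, 2018, 2019, 2019, 2019),
  State  = c("PA", "NY", "PA", "OH", "PA"),
  Temp_F = c(71, 68, 8000, 75, 80)
)

nrow(toy)         # number of rows (one row per report, not per insect)
names(toy)        # the variables in the dataset
table(toy$Year)   # how many rows there are for each year

# Count the rows that match a condition: TRUE counts as 1 when summed
n_pa <- sum(toy$State == "PA")
n_pa
```

Saving a count to a variable like n_pa means you can drop it straight into your text as inline code.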


4.5.2 Data wrangling

In the data wrangling section, use either FILTERING Tutorial 7D or https://crd150.github.io/lab2.html#Filtering to help you complete these tasks:

  1. Use R-code to find the value of the second row and the 4th column in your data

  2. In the MAIN firefly data, if you look closely at your summary, you might find there are some unusual temperature values.
    Let’s assume that the temperature of 8000F is not likely to be true. Filter the data so that the temperature is below 200F and overwrite the original (i.e. save the result as a variable called firefly).
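Both tasks can be sketched on a small made-up table. The column name Temp_F is an assumption, and this version uses base R's subset() so it runs without extra packages; the dplyr equivalent from the tutorials is shown in a comment.

```r
# Made-up example table. The real firefly column names will differ.
toy <- data.frame(
  Year   = c(2018, 2018, 2019, 2019),
  State  = c("PA", "NY", "PA", "OH"),
  Temp_F = c(71, 68, 8000, 75)
)

# Square brackets index a table as [row, column]
toy[1, 3]    # the value in the first row, third column
toy[3, ]     # the whole third row (column left blank = all columns)

# Keep only the rows where the condition is TRUE, then overwrite
toy <- subset(toy, Temp_F < 200)
# dplyr equivalent: toy <- dplyr::filter(toy, Temp_F < 200)

summary(toy$Temp_F)   # check that the impossible value has gone
```

Overwriting is convenient, but it means that if you re-run the chunk out of order you should re-read the data first.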

4.5.3 Making your data spatial

If your datacamp/training gives you a better way of approaching this code, go for it!

These tutorials should help: Tutorial 11A and Tutorial 11B, along with the data camp course.

The firefly data is in standard lat/lon, so EPSG=4326.

Use your own knowledge or these instructions to make an sf version of your firefly data and assign it to a variable called firefly.sf. You can leave it in lon/lat/4326 for this lab.


  1. Use Tutorial 11Bc to load rnaturalearth state-boundaries for US States. Assign to a variable called states.sf and use st_transform to convert to projection 4326.
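One common way to turn a table with coordinate columns into an sf object is st_as_sf(). This is only a sketch on made-up points: the column names lon and lat are assumptions, and the tutorials may take a different route.

```r
library(sf)

# Made-up points. The real firefly table may name its coordinate
# columns differently, so check names(firefly) first.
toy <- data.frame(
  site = c("A", "B", "C"),
  lon  = c(-77.86, -75.16, -79.99),
  lat  = c(40.79, 39.95, 40.44)
)

# coords is given as c(x, y), i.e. longitude first, then latitude.
# crs = 4326 says the numbers are standard lon/lat degrees.
toy.sf <- st_as_sf(toy, coords = c("lon", "lat"), crs = 4326)

st_crs(toy.sf)$epsg   # confirm the coordinate reference system

# st_transform() re-projects an existing sf object. Here it converts
# to UTM zone 18N (EPSG:32618), purely to show the call.
toy.utm <- st_transform(toy.sf, 32618)
st_crs(toy.utm)$epsg
```

Note the difference: the crs argument in st_as_sf() labels what the coordinates already are, while st_transform() actually changes them to a new projection.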


4.6 Making maps [5 MARKS]

Let’s now see how our data looks plotted. I have provided a few examples. Your job is to get them running and interpret them.

4.6.0.1 Make a basic plot

In a new code chunk enter the following code. You should see a basic plot with the firefly locations and the state borders. If so, congrats! If not, you need to adjust your projections or something has happened.

plot(st_geometry(firefly.sf),
     pch=16,
     col=rgb(0,0,1,.5),
     cex=.5,
     main="Firefly locations")

plot(st_geometry(states.sf),add=TRUE)

Recreate this plot in your report. Google the rgb() command and edit your plot so that the points are semi-transparent purple. (hint https://www.r-graph-gallery.com/43-rgb-colors.html )
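If rgb() is new to you, here is a small sketch of how its four numbers work, using orange rather than purple and random points rather than the firefly data.

```r
# rgb(red, green, blue, alpha): by default each value runs from 0 to 1.
# alpha is the opacity: 0 is invisible, 1 is completely solid.
solid_orange <- rgb(1, 0.5, 0)        # no alpha given, so fully solid
faint_orange <- rgb(1, 0.5, 0, 0.3)   # same colour, 30% opaque

# Random points, just to show the effect of transparency
set.seed(1)
x <- rnorm(200)
y <- rnorm(200)

plot(x, y, pch = 16, cex = 1.5, col = faint_orange,
     main = "Overlapping points look darker")
```

Transparency is useful for point maps because dense clusters show up as darker patches instead of one solid blob.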

Hint, you can also use tmap and QTM from the previous labs to explore the data.

4.6.0.2 Make a more detailed tmap plot

The plot above is still pretty basic, so let’s explore another of the big packages available for making spatial visualisations. We’re going to extend your knowledge of tmap.

Look at the command below, you can see that we’re building a series of layers linked by the + symbol.

tmap_mode("plot")                             # Set the static plot mode

myplot <- tm_shape(firefly.sf) +               # Load the firefly data 
          tm_dots(col="black", size=0.05) +    # Plot it as dots
          tm_shape(states.sf) +                # Load the state borders
          tm_borders(lwd=.5)                   # Plot them as just borders
  
myplot

I have saved the map as a variable called myplot, then typed its name to print it. Because the map is saved, I can now turn on the interactive view mode and re-plot:

tmap_mode("view")
myplot

and back to static:

tmap_mode("plot")
myplot

Get the tmap plots working in your lab-script. If the interactive one crashes knit, you don’t need to include it.

4.7 Discussion [10 MARKS]

  • Explore and summarise any spatial patterns you see in the data.

  • Comment on whether the data appears to follow Tobler’s law at this scale.

  • One common theory is that fireflies are only seen in areas with certain temperatures and elevations. Another is that firefly numbers are linked to light pollution and cities.
    Change the base-maps in the INTERACTIVE MAP (there’s a button that lets you change the layer on the map) to explore and discuss these theories. For example, the terrain layer gives a sense of elevation and the open maps layer will let you see where the cities are. You do not need to do any additional formal analysis.

We will return to fireflies in future labs.

5 ABOVE & BEYOND

In the plot commands above, we used st_geometry(). Explain what it does and why it is useful (2 marks), with an additional 2 marks for turning it on and off.


6 SUBMITTING YOUR LAB

Remember to save your work throughout and to spell check your writing (next to the save button). Now, press the knit button again. If you have not made any mistakes in the code, then R should create an html file in your Lab folder which includes your answers.

6.1 On laptops

  • Go to your 364 folder and look for your lab folder.

  • You should see an .html file (Edge/Chrome) with a very recent time-stamp, along with an .Rmd file with a recent time-stamp.

  • In that folder, double click on the html file. This will open it in your browser. CHECK THAT THIS IS WHAT YOU WANT TO SUBMIT.

  • Now go to Canvas and submit BOTH your html and your .Rmd file.



6.2 On R studio cloud

  • If you are on R studio cloud, see XXXXX for how to download your files


  • Now go to Canvas and submit BOTH your html and your .Rmd file.



7 CHECK THIS BEFORE YOU SUBMIT!

People who use this section get better grades…

7.1 Predict your grade

Here is LITERALLY how we are grading you. Predict your grade!

HTML FILE SUBMISSION - 8 marks

RMD CODE SUBMISSION - 8 marks

MARKDOWN/CODE STYLE - 10 MARKS

Your code and document are neat and easy to read. LOOK AT YOUR HTML FILE IN YOUR WEB-BROWSER BEFORE YOU SUBMIT. There is also a spell check next to the save button. You have written your answers below the relevant code chunk, in full sentences, so that they are easy to find and grade and it is clear what each answer refers to.

CODE SHOWCASE - 30 MARKS

  • 20 for data camp
  • 10 for the inline code

DATA ANALYSIS - 40 MARKS

  • 15 for your study design/backgrounds
  • 10 for your data description and filtering
  • 5 for the mapping
  • 10 for the thoughtfulness/quality of your discussion

Above and beyond: 4 MARKS

See above

[100 marks total]

7.2 What your grade means

Why is 100% hard? Overall, here is what your lab should correspond to:

Grade | % Mark | Rubric
A*    | 98-100 | Exceptional. Not only was it near perfect, but the graders learned something. THIS IS HARD TO GET.
NA    | 96+    | You went above and beyond.
A     | 94+    | Everything asked for with high quality. Class example.
A-    | 90+    | The odd minor mistake. All code done but not written up in full sentences etc. A little less care.
B+    | 87+    | More minor mistakes. Things like missing units, getting the odd question wrong, no workings shown.
B     | 84+    | Solid work but the odd larger mistake or missing answer. Completely misinterpreted something, that type of thing.
B-    | 80+    | Starting to miss entire questions/sections, or multiple larger mistakes. Still a solid attempt.
C+    | 77+    | You made a good effort and did some things well, but there were a lot of problems (e.g. you wrote up the text well, but messed up the code).
C     | 70+    | It’s clear you tried and learned something. Just attending labs will get you this much, as we can help you get to this stage.
D     | 60+    | You attempted the lab and submitted something. Not clear you put in much effort, or you had real issues.
F     | 0+     | Didn’t submit, or incredibly limited attempt.